2023
Visual Instruction Tuning
LLaVA paper: aligns LLMs with visual information through instruction tuning on image-text pairs, enabling multimodal understanding and reasoning.
BLIP-2 leverages frozen image encoders and LLMs for efficient vision-language pre-training, achieving state-of-the-art multimodal performance.
CLIP explained: contrastive learning on 400M image-text pairs enables zero-shot image classification and powerful vision-language understanding.
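CLIP's contrastive objective can be sketched as a symmetric InfoNCE loss over a batch of paired image and text embeddings: matched pairs sit on the diagonal of a cosine-similarity matrix, and cross-entropy is applied in both the image-to-text and text-to-image directions. This is a minimal NumPy illustration under assumed names (`clip_contrastive_loss`, a fixed `temperature` of 0.07), not the original implementation:

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss for a batch of matched image/text pairs.

    img_emb, txt_emb: (N, D) arrays; row i of each is a matched pair.
    """
    # L2-normalize so the dot product is cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature   # (N, N); matched pairs on the diagonal
    labels = np.arange(len(img))

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # average the image->text and text->image directions
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))
```

Perfectly aligned embeddings drive the loss toward zero, while mismatched pairs are penalized, which is what lets the learned embedding space be reused for zero-shot classification by comparing an image against text prompts.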